Evaluating Structural Equation Models for Categorical Outcomes: A New Test Statistic and a Practical Challenge of Interpretation


Abstract 
This research is concerned with two topics in assessing model fit for categorical data analysis. 
The first topic involves the application of a limited-information overall test, introduced in the 
item response theory literature, to Structural Equation Modeling (SEM) of categorical outcome 
variables. Most popular SEM test statistics assess how well the model reproduces estimated 
polychoric correlations. In contrast, limited-information test statistics assess how well the
observed categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and
Monroe (2014) is applied. The second topic concerns how the Root Mean Square Error of
Approximation (RMSEA) fit index can be affected by the number of categories in the outcome
variable. This relationship creates challenges for interpreting RMSEA. While the two topics
initially appear unrelated, they may conveniently be studied in tandem, since RMSEA is based
on an overall test statistic, such as C2. The results are illustrated with an empirical application to
data from a large-scale educational survey.
1 Introduction

This research concerns two distinct but related topics in assessing the fit of latent 
variable models for ordered categorical data. The first topic is the application of the limited- 
information overall test statistic C2 (Cai & Monroe, 2014) to Structural Equation Modeling 
(SEM). The second topic is how the Root Mean Square Error of Approximation index (RMSEA; 
Steiger & Lind, 1980) is affected by the number of categories in the outcome variable. An 
important connection between the two topics is that RMSEA is based on non-centrality
(population lack of fit) estimated from an overall goodness-of-fit (GOF) test statistic, such as C2.
That is, RMSEA also depends on the choice of underlying overall test statistic, since different 
test statistics lead to different manifestations of non-centrality. 
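To make this dependence concrete, the sketch below computes one common point estimate of RMSEA from an overall statistic T with df degrees of freedom and sample size N, namely the square root of max(T − df, 0)/(df · N). It is a minimal illustration under that assumption, not the formula of any particular program (some software divides by N − 1 rather than N); the function name and the values of T are hypothetical.

```python
import math


def rmsea_from_statistic(T: float, df: int, N: int) -> float:
    """Point estimate of RMSEA from an overall test statistic T.

    The non-centrality is estimated as max(T - df, 0); dividing by df and N
    gives misfit per degree of freedom. Using N (not N - 1) is an assumption.
    """
    noncentrality = max(T - df, 0.0)
    return math.sqrt(noncentrality / (df * N))


# Two hypothetical statistics for the same df and N give different RMSEA
# values, so the estimate inherits the behavior of the chosen statistic.
print(rmsea_from_statistic(T=51.0, df=51, N=1000))             # 0.0
print(round(rmsea_from_statistic(T=178.5, df=51, N=1000), 4))  # 0.05
```

A statistic at or below its degrees of freedom yields an estimated RMSEA of exactly zero, because the estimated non-centrality is truncated at zero.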

To appreciate the motivation for the application of C2, it is helpful to consider the
testing of structural models for continuous data. In this case, a sample covariance matrix 
summarizes the continuous data. Then, following estimation, a test statistic is formed that 
measures how well the structural model reproduces the sample covariance matrix. Depending 
on the estimation approach, a moment correction (e.g., Satorra & Bentler, 1994; Asparouhov & 
Muthén, 2010) can be applied to the test statistic so that it approximately follows a chi-square 
distribution. 

Currently, in many popular SEM software packages, the standard procedure for 
estimating structural models for ordinal variables is the multistage estimator (e.g., Muthén, 
1984). With this estimator, a polychoric correlation matrix is estimated from the categorical 
data. Then, typically, testing the structural model proceeds as in the continuous case. More 
specifically, a test statistic is formed that measures how well the structural model reproduces 
the estimated polychoric correlation matrix. Also, a moment correction is applied to the test 
statistic. 

While the two procedures just described are quite similar, a fundamental distinction 
exists. As noted in Muthén (1993), unlike the sample covariances of continuous variables, the 
estimated polychoric correlations of categorical variables are model-based. Specifically, in 
practice, it is assumed that the observed categorical data arise from discretizing multivariate
normal underlying variables. Given this additional stage of estimation, it is arguably necessary to test the
structural model directly against the observed categorical data. This can be accomplished using
limited-information test statistics, such as C2.

The C2 statistic is among a number of limited-information tests that have been
developed recently (e.g., Maydeu-Olivares & Joe, 2006; Cai & Hansen, 2013) for models of
categorical data. For n observed categorical variables, the data can be organized in an n-way
contingency table. While full-information tests, such as Pearson’s X², depend on the entire n-way
table, limited-information tests are “limited” in the sense that they depend only on a subset
of lower-order marginal tables. For C2, the subscript denotes the use of marginal tables up to
the second order (i.e., first- and second-order). In comparison to full-information tests,
limited-information tests have two main advantages: they are better calibrated (Maydeu-Olivares & Joe,
2006) and potentially more powerful (Joe & Maydeu-Olivares, 2010). These advantages are 
more pronounced for sparse contingency tables, which are routinely encountered in 
applications of SEM to empirical data in the social and behavioral sciences (Bartholomew & 
Tzamourani, 1999). 

While the limited-information testing methodology has been primarily applied to Item 
Response Theory (IRT) models, the methodology has also been applied to SEM. In an early 
application of limited-information tests, Maydeu-Olivares (2006) proposed a quadratic form in 
second-order residuals for this purpose. However, more recent research on the limited- 
information methodology (e.g., Maydeu-Olivares & Joe, 2006) has yielded tests that are 
practically and theoretically more appealing. One such test statistic is C2. As discussed in Cai
and Monroe (2014), C2 is well calibrated under a variety of conditions, such as second-order
marginal table sparseness, and can be computed for models with relatively few outcome
variables and relatively many ordinal categories. Further, in comparison to other
limited-information test statistics, C2 can be substantially more powerful in detecting model
misspecification (Cai & Monroe, 2014). The first contribution of this research, then, is to apply
C2 to SEM of ordered categorical data, specifically in the context of multistage estimation. This
context also provides an opportunity to compare a limited-information test (i.e., C2) to a
moment-corrected test, which, to our knowledge, has not been done before.


As mentioned above, the second contribution of this research concerns the 
interpretation of RMSEA when the observed variables are categorical. Given a sufficiently large 
sample size, the presence of any amount of model error (e.g., MacCallum & Tucker, 1991) will 
lead to a proposed model being rejected by an overall GOF statistic, such as C2. In the SEM
literature, this is commonly referred to as the sample size problem (Cudeck & Henly, 1991). In
response to this problem, SEM researchers have, over the years, proposed various fit indices
and developed interpretive guidelines for continuous, normally distributed outcomes. For
example, with the RMSEA index, a value of less than .05 is indicative of “close fit” (Browne &
Cudeck, 1993).

More recently, researchers have made efforts to adapt these indices and guidelines for 
use with categorical outcomes. Within the IRT framework, these indices are typically based on 
the limited-information M2 statistic (Maydeu-Olivares & Joe, 2006). For example,
Maydeu-Olivares (2013) developed a rationale for constructing an M2-based RMSEA. More recently,
Maydeu-Olivares and Joe (2014) expanded on this line of research and proposed some cutoff
criteria for approximate fit. Another example is provided by Lee and Cai (2012), who
proposed an M2-based Tucker-Lewis Index (Tucker & Lewis, 1973). Within the SEM
framework, these indices have typically been constructed from moment-corrected tests.
Regardless of the framework, the interpretation of these indices has received much
less attention for categorical data than for continuous data. To help address this issue, we 
examine how RMSEA is affected by the number of categories in the outcome variables. This 
choice is motivated by results reported in Cai and Monroe (2013), which suggest that RMSEA, 
in a sense, behaves differently depending on the number of categories of the outcome variables. 

This RMSEA behavior can conveniently be studied along with C2 due to the
underlying response process formulation of factor analytic measurement models (Thurstone, 
1925; Thurstone, 1927; Lord, 1952) assumed under multistage estimation. The underlying 
response process provides a direct connection between structural models of continuous and 
categorical data, which can be utilized in the following way. 

First, using some procedure for introducing model error, a population correlation matrix of
continuous variables can be created. For a chosen (working) model and discrepancy function
(e.g., the maximum likelihood discrepancy function; Browne & Arminger, 1995), minimizing
the function for the population correlation matrix yields a population discrepancy function
value and a derived population RMSEA. Next, underlying response variables can be randomly
sampled from this population matrix to create datasets of continuous variables. In accordance
with the underlying response variable formulation, these continuous variables may be
discretized to generate categorical datasets. All of the datasets contain both model error
(because the population RMSEA is nonzero) and sampling error. However, for the
categorical datasets, the discretization itself does not introduce additional model error, 
assuming correct distributional specification of the underlying response process variables (e.g., 
multivariate normal). With a sufficiently large number of Monte Carlo replications, the 
sampling error may be averaged out. Then, the RMSEA estimates may be directly compared to 
the uniquely defined population RMSEA. We believe that the simulation results may shed 
some light on how RMSEA should be practically interpreted for SEM of categorical data. 

The rest of the paper is organized as follows. Section 2 presents a motivating example. 
Section 3 presents a structural model for ordinal data and the multistage estimator. Also, 
established fit statistics for the multistage estimator are introduced. Then, in Section 4,
limited-information testing methodology is presented and the C2 statistic is introduced. Section 5
presents a simulation study for C2 and the results. Section 6 explores the behavior of RMSEA,
using the results from Section 5. Then, an empirical application of the proposed methods is 
given in Section 7. Finally, a conclusion and discussion of further research directions are 
provided in Section 8. 
2 A Running Example 

The Program for International Student Assessment (PISA; OECD, 2005) administers a
student questionnaire covering various schooling- and background-related topics. One of
these topics, surveyed in 2003, is students’ perceptions of their own mathematical aptitude.
Table 1 presents the 12 items hypothesized to represent three distinct but correlated constructs. 
These constructs are positive self-concept as a mathematics student (PSC), mathematics anxiety 
(ANX), and task-specific confidence (TASK). Each of the 12 items has a 4-point response scale. 
For PSC and ANX, the options are “strongly disagree,” “disagree,” “agree,” and “strongly
agree.” For TASK, the options are “not at all confident,” “not very confident,” “confident,” and
“very confident.” 


Insert Table 1 about here 


One of the reasons PISA administers the student questionnaire is to allow researchers 
to explore how school and student characteristics relate to achievement outcomes. As an 
example, consider the full mediation model (see, e.g., Finch, West, & MacKinnon, 1997) shown 
in Figure 1. While this model is merely illustrative, it is similar to those studied by substantive 
researchers (see, e.g., Meece, Eccles, & Wigfield, 1990). In the model, ANX is regressed on PSC. 
Further, TASK is regressed on both ANX and PSC. This ordinal structural model could be 
estimated by the multistage estimator, at which point a researcher would typically need to
examine its fit to the data.


Insert Figure 1 about here 
3 A Structural Equation Model for Ordered Categorical Responses 
3.1 The Data and the Model 
Let there be $i = 1, \ldots, N$ respondents and $j = 1, \ldots, n$ variables. Let $\mathbf{y}_i^*$ be an $n \times 1$ vector of continuous underlying response variables. It is typically assumed that $\mathbf{y}_i^*$ is multivariate normal, that is, $\mathbf{y}_i^* \sim N_n(\mathbf{0}, \mathbf{P})$, where $\mathbf{P}$ is an $n \times n$ correlation matrix. The $d_\rho = n(n-1)/2$ unique correlations are stacked and collected in the $d_\rho \times 1$ vector $\boldsymbol{\rho}$.

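It is assumed that a $p \times 1$ vector of latent factors $\boldsymbol{\eta}_i$ is related to $\mathbf{y}_i^*$ via a factor analytic measurement model. For the $i$th case, this may be represented as $\mathbf{y}_i^* = \boldsymbol{\Lambda}\boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i$. Further, the structural relationships among the latent variables are assumed to take the form $\boldsymbol{\eta}_i = \boldsymbol{\alpha} + \mathbf{B}\boldsymbol{\eta}_i + \boldsymbol{\zeta}_i$. In the above equations, the unique factors in $\boldsymbol{\epsilon}$ and the disturbance terms in $\boldsymbol{\zeta}$ have zero means. Their covariance matrices are $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$, respectively. Assuming that $\boldsymbol{\epsilon}$ and $\boldsymbol{\zeta}$ are orthogonal, the covariance structure for $\mathbf{y}^*$ is

$$\operatorname{cov}(\mathbf{y}^*) = \boldsymbol{\Lambda}\mathbf{A}\boldsymbol{\Phi}\mathbf{A}'\boldsymbol{\Lambda}' + \boldsymbol{\Psi}, \tag{1}$$

where $\mathbf{A} = (\mathbf{I}_p - \mathbf{B})^{-1}$, assuming $\mathbf{I}_p - \mathbf{B}$ is invertible, and $\mathbf{I}_p$ is a $p \times p$ identity matrix. To identify the model, it is generally necessary to set $\operatorname{diag}(\boldsymbol{\Psi}) = \operatorname{diag}(\mathbf{I}_n - \boldsymbol{\Lambda}\mathbf{A}\boldsymbol{\Phi}\mathbf{A}'\boldsymbol{\Lambda}')$. This identification condition implies that $\operatorname{cov}(\mathbf{y}^*) = \mathbf{P}$ is a correlation matrix.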
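Equation (1) and the identification condition can be evaluated directly. The NumPy sketch below builds the implied correlation matrix for a three-factor model with the structure of the running example. The loadings (all 0.7, four indicators per factor) and the disturbance variances in $\boldsymbol{\Phi}$ are hypothetical choices, the latter picked so that each latent variance equals 1; the regression weights are the values later used in the simulation study. The intercepts in $\boldsymbol{\alpha}$ do not affect the covariance structure and are omitted.

```python
import numpy as np


def implied_correlation(Lambda, B, Phi):
    """cov(y*) from Equation (1), with Psi set by the identification condition."""
    p = B.shape[0]
    A = np.linalg.inv(np.eye(p) - B)            # A = (I_p - B)^{-1}
    common = Lambda @ A @ Phi @ A.T @ Lambda.T  # Lambda A Phi A' Lambda'
    Psi = np.diag(1.0 - np.diag(common))        # diag(Psi) = diag(I_n - common)
    return common + Psi


# Hypothetical measurement model: 12 indicators, 4 per factor, loadings 0.7.
Lambda = np.zeros((12, 3))
for f in range(3):
    Lambda[4 * f:4 * f + 4, f] = 0.7

# eta1 = PSC, eta2 = ANX, eta3 = TASK; B[r, c] is the effect of eta_c on eta_r.
B = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.0, 0.0],
              [0.4, 0.36, 0.0]])
Phi = np.diag([1.0, 0.91, 0.624])  # assumed, so that var(eta) = 1 for each factor

P = implied_correlation(Lambda, B, Phi)
```

With unit latent variances, two indicators of the same factor correlate 0.7 × 0.7 = 0.49, and an indicator of PSC correlates 0.7 × 0.3 × 0.7 = 0.147 with an indicator of ANX.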


By the underlying response process formulation, the continuous $\mathbf{y}_i^*$ are not observed. Instead, the $n \times 1$ vector of observed categorical variables $\mathbf{y}_i$ results from the discretization of $\mathbf{y}_i^*$. To facilitate the presentation, we assume that all observed variables have the same number of categories, $K$, coded $0, \ldots, K-1$. Then, for each variable $j$, there are $K-1$ thresholds, $\tau_{j1}, \ldots, \tau_{j,K-1}$. In all, there are $d_\tau = n(K-1)$ thresholds, which can be collected into a $d_\tau \times 1$ vector $\boldsymbol{\tau}$. Finally, $y_{ij}^*$ and $y_{ij}$ are related via the thresholds, where

$$y_{ij} = k \quad \text{if} \quad \tau_{jk} < y_{ij}^* \le \tau_{j,k+1}, \tag{2}$$

with $\tau_{j0} = -\infty$ and $\tau_{jK} = \infty$.
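Equation (2) amounts to counting, for each value of $y_{ij}^*$, how many of variable $j$'s thresholds lie strictly below it. A minimal NumPy sketch follows; the function name and the threshold values in the example are hypothetical.

```python
import numpy as np


def discretize(y_star, thresholds):
    """Apply Equation (2): y_ij = k if tau_jk < y*_ij <= tau_j,k+1.

    y_star is an N x n array; thresholds[j] lists the K - 1 increasing
    thresholds of variable j. Category codes run from 0 to K - 1.
    """
    y_star = np.asarray(y_star, dtype=float)
    codes = np.empty(y_star.shape, dtype=int)
    for j, tau in enumerate(thresholds):
        # side="left" counts thresholds strictly below y*, so a value equal
        # to tau_j,k+1 stays in category k, matching the <= in Equation (2).
        codes[:, j] = np.searchsorted(np.asarray(tau, dtype=float),
                                      y_star[:, j], side="left")
    return codes


taus = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]  # hypothetical thresholds, K = 4
print(discretize([[-2.0, 0.0], [0.5, 3.0]], taus))
```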
3.2 Multistage Estimation and Testing 

Multistage estimation begins by obtaining an estimate of the (polychoric) correlations in $\boldsymbol{\rho}$. In practice, this is often accomplished in two steps. First, the thresholds are estimated by maximum likelihood, one item at a time, yielding $\hat{\boldsymbol{\tau}}$. Next, treating $\hat{\boldsymbol{\tau}}$ as fixed, the bivariate correlations are estimated by maximum likelihood, one pair of items at a time. This yields a vector of estimated polychoric correlations, $\hat{\boldsymbol{\rho}}$. To facilitate the presentation, we assume that no constraints are imposed on the thresholds. Then, the free structural parameters (e.g., factor loadings and latent regression coefficients) can be estimated by minimizing a weighted least squares (WLS) function of the polychoric correlation residuals. Formally, let the $q$ free parameters be collected in the vector $\boldsymbol{\theta}$, and let $\boldsymbol{\rho}(\boldsymbol{\theta})$ represent the model-implied correlations. Then, the estimator $\hat{\boldsymbol{\theta}}$ is obtained by minimizing

$$F(\boldsymbol{\theta}; \mathbf{W}) = (\hat{\boldsymbol{\rho}} - \boldsymbol{\rho}(\boldsymbol{\theta}))' \mathbf{W} (\hat{\boldsymbol{\rho}} - \boldsymbol{\rho}(\boldsymbol{\theta})), \tag{3}$$

where $\mathbf{W}$ is a positive definite weight matrix.
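For the first step, with no constraints on the thresholds, the item-level maximum likelihood estimate has a closed form: each threshold is the inverse standard normal CDF of a cumulative sample proportion. The standard-library sketch below illustrates this; the function name and the responses are hypothetical, and it assumes every category is observed, since a cumulative proportion of 0 or 1 has no finite threshold.

```python
from statistics import NormalDist


def estimate_thresholds(responses, K):
    """Univariate threshold estimates for one item with codes 0, ..., K - 1.

    tau_k = Phi^{-1}(proportion of responses below category k), k = 1..K-1.
    """
    N = len(responses)
    counts = [0] * K
    for r in responses:
        counts[r] += 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    taus, cumulative = [], 0
    for k in range(K - 1):
        cumulative += counts[k]
        taus.append(std_normal.inv_cdf(cumulative / N))
    return taus


item = [0] * 16 + [1] * 34 + [2] * 34 + [3] * 16  # hypothetical responses
print([round(t, 3) for t in estimate_thresholds(item, K=4)])
```

For these symmetric counts the middle threshold is 0 and the outer two are mirror images of each other.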

Next, we consider the form of the weight matrix $\mathbf{W}$. Let $\mathbf{V}$ be a consistent estimate of the asymptotic covariance matrix of $\hat{\boldsymbol{\rho}}$. Further, let $\mathbf{D} = \operatorname{diag}(\mathbf{V})$ be the diagonal matrix containing the diagonal elements of $\mathbf{V}$. The most common choices for $\mathbf{W}$ in Equation (3) are as follows. Choosing $\mathbf{W} = \mathbf{V}^{-1}$ results in the full weighted least squares estimator (WLS; Muthén, 1978). Choosing $\mathbf{W} = \mathbf{D}^{-1}$ results in the diagonally weighted least squares estimator (DWLS; Muthén, du Toit, & Spisic, 1997). Finally, choosing $\mathbf{W} = \mathbf{I}$ results in the unweighted least squares estimator (ULS; Muthén, 1993). While theoretically important, WLS is not often used in practice, as it tends to perform poorly unless $N$ is very large. Under correct model specification and standard regularity conditions, the multistage estimator is $\sqrt{N}$-consistent and asymptotically normal (Jöreskog, 1994; Lee, Poon, & Bentler, 1995; Muthén & Satorra, 1995).
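The discrepancy in Equation (3) and the three weight matrices can be written compactly. The NumPy sketch below only evaluates $F$ for a given vector of model-implied correlations; an actual estimator would minimize it over $\boldsymbol{\theta}$. The correlations and the matrix V in the example are hypothetical numbers.

```python
import numpy as np


def weight_matrix(V, kind):
    """W for WLS (V^{-1}), DWLS (D^{-1} with D = diag(V)), or ULS (I)."""
    if kind == "WLS":
        return np.linalg.inv(V)
    if kind == "DWLS":
        return np.diag(1.0 / np.diag(V))
    if kind == "ULS":
        return np.eye(V.shape[0])
    raise ValueError(f"unknown estimator: {kind}")


def discrepancy(rho_hat, rho_model, W):
    """F(theta; W) = (rho_hat - rho(theta))' W (rho_hat - rho(theta))."""
    r = np.asarray(rho_hat, dtype=float) - np.asarray(rho_model, dtype=float)
    return float(r @ W @ r)


rho_hat = np.array([0.50, 0.30])    # hypothetical polychoric estimates
rho_model = np.array([0.45, 0.35])  # hypothetical model-implied values
V = np.array([[0.01, 0.002], [0.002, 0.04]])
for kind in ("ULS", "DWLS", "WLS"):
    print(kind, round(discrepancy(rho_hat, rho_model, weight_matrix(V, kind)), 4))
```

The same residuals receive very different totals under the three weightings, which is why the minimized values for ULS and DWLS are not interchangeable. Multiplying a minimized value by N gives T = N × F.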

In this research, only ULS and DWLS are used to estimate ordinal structural models. Accordingly, let $\hat{\boldsymbol{\theta}}_U$ and $\hat{\boldsymbol{\theta}}_D$ be the vectors of parameter estimates obtained using ULS and DWLS, respectively. Similarly, let $\hat{F}_U$ and $\hat{F}_D$ be the respective minimized discrepancy function values. Such a discrepancy function value, $\hat{F}$, can be used to construct an overall GOF statistic, $T = N \times \hat{F}$. However, for ULS and DWLS, $T$ is not chi-square distributed even under correct model specification (Browne, 1984). As suggested by Muthén (1993), moment corrections may instead be applied to $T$ to construct a test statistic that is approximately chi-square distributed. These moment corrections are analogous to those used in the continuous data case (Satorra & Bentler, 1994). While several adjustments have been proposed, this research utilizes the correction of Asparouhov and Muthén (2010), which is denoted by $\bar{T}$. An advantage of $\bar{T}$ is that it scales $T$ so that the resulting statistic is approximately chi-square distributed with the “natural” degrees of freedom (i.e., the difference between the numbers of parameters in the saturated and estimated models). The use of ULS and DWLS to calculate $\bar{T}$ yields $\bar{T}_U$ and $\bar{T}_D$, respectively.

4 Limited-Information Testing Methodology 

While test statistics based on quadratic forms in the correlational residuals in $\hat{\boldsymbol{\rho}} - \boldsymbol{\rho}(\hat{\boldsymbol{\theta}})$ have proven useful in evaluating the fit of ordinal structural models, these statistics were not specifically developed for categorical data and contingency tables. In a certain sense, these statistics may be regarded as afterthoughts, developed as the result of fitting categorical data into a factor-analytic framework largely dominated by continuous outcomes. In contrast, recent years have seen a number of limited-information statistics specifically developed for latent variable models with categorical outcomes. Generally, these statistics are quadratic forms in linear functions of multinomial cell residuals from the $n$-way contingency table formed by cross-tabulating the observed responses. Some examples are M2 (Maydeu-Olivares & Joe, 2006), M2* (Cai & Hansen, 2013), and C2 (Cai & Monroe, 2014). We have chosen to apply and study the C2 statistic in this research, as it has theoretical and practical advantages over both M2 and M2* (Cai & Monroe, 2014). The presentation here focuses on the application of C2 to the ordinal structural model with multistage estimation. Readers interested in a more technical account of C2, or its application to IRT models, are referred to Cai and Monroe (2014).
4.1 Full-Information and Limited-Information Test Statistics 

Returning to the structure of the data, recall that $K$ is the number of response categories per item. In total, there are $\kappa = K^n$ possible response patterns, a number that increases rapidly with $K$ and $n$. For example, for the PISA model introduced in Section 2, $\kappa = 4^{12} > 16$ million. Let the $\kappa \times 1$ vector $\mathbf{p}$ collect the $\kappa$ sample proportions. Similarly, let $\boldsymbol{\pi}(\hat{\boldsymbol{\theta}})$ collect the $\kappa$ model-implied response pattern probabilities. Then, let $\mathbf{e} = \mathbf{p} - \boldsymbol{\pi}(\hat{\boldsymbol{\theta}})$ be the cell residuals. Assuming the model is correctly specified in the population, with true parameter vector $\boldsymbol{\theta}_0$, let the true model-implied probabilities be $\boldsymbol{\pi}_0 = \boldsymbol{\pi}(\boldsymbol{\theta}_0)$. In this case, the observed data may be regarded as a sample of size $N$ from a multinomial distribution with $\kappa$ categories.

One approach to testing structural models for categorical data is to use a full-information test, which directly uses the full set of multinomial residuals. Pearson’s $X^2$ is one such test and is defined as

$$X^2 = N \sum_{c=1}^{\kappa} \frac{[p_c - \pi_c(\hat{\boldsymbol{\theta}})]^2}{\pi_c(\hat{\boldsymbol{\theta}})}.$$

When a fully efficient estimator, such as maximum likelihood, is used to obtain $\hat{\boldsymbol{\theta}}$ and the model is correctly specified in the population, $X^2$ is approximately chi-square distributed with $\kappa - q - 1$ degrees of freedom. Despite this asymptotic result, $X^2$ is not generally useful for testing structural models for categorical data, for several reasons. First, for large values of $\kappa$, some model-implied probabilities must necessarily be near zero. In the literature, this is often referred to as sparseness of the contingency table. Under sparseness, the Type I error rates and power of $X^2$ are both adversely affected (e.g., Bartholomew & Leung, 2002). An accompanying problem is computational. For large $K$ and/or $n$, $\kappa$ may be so large that calculating $X^2$ becomes computationally impractical. Recall that $\kappa > 16$ million for the PISA model, with only 12 variables. Finally, in fitting structural models to categorical data, estimators that are not fully efficient, such as the multistage estimator, are frequently used. In this case, $X^2$ will not follow its nominal chi-square distribution with $\kappa - q - 1$ degrees of freedom.
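A direct computation of Pearson's $X^2$ makes the computational point visible: the sum runs over all $\kappa = K^n$ cells, and cells with tiny model-implied probabilities appear in the denominators. The sketch below uses a hypothetical two-item binary example ($\kappa = 4$) and then prints $\kappa$ for the 12-item, 4-category PISA model; the function name is hypothetical.

```python
import numpy as np


def pearson_x2(pattern_counts, model_probs):
    """X^2 = N * sum_c (p_c - pi_c)^2 / pi_c over all response patterns c."""
    counts = np.asarray(pattern_counts, dtype=float)
    pi = np.asarray(model_probs, dtype=float)
    N = counts.sum()
    p = counts / N  # sample proportions
    return float(N * np.sum((p - pi) ** 2 / pi))


# Patterns (0,0), (0,1), (1,0), (1,1) for two binary items; N = 100.
print(pearson_x2([30, 20, 20, 30], [0.25, 0.25, 0.25, 0.25]))
print(4 ** 12)  # number of cells for the PISA model: 16777216
```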

Another, more appealing, approach is provided by limited-information tests. 
Generally, these tests are quadratic forms that depend on lower-order sample proportions and
model-implied probabilities. Different limited-information tests can be distinguished by: 1)
which lower-order proportions and probabilities are used; 2) how the proportions and
probabilities are combined; and 3) how the distribution of the test is approximated. Here, we 
focus on first and second-order proportions and probabilities to summarize the categorical data, 
which is akin to using means and covariances to summarize continuous data. 

For a single variable, there are only $K - 1$ independent probabilities, as the $K$ cell probabilities must sum to 1. Conveniently, an independent set can be obtained by removing the cell with category code $k = 0$. Then, let $\dot{\mathbf{p}}$ and $\dot{\boldsymbol{\pi}}(\hat{\boldsymbol{\theta}})$ be the vectors of length $s_1 = n(K-1) = d_\tau$ consisting of all linearly independent first-order marginal proportions and model-implied probabilities, respectively. Let $\dot{\mathbf{e}} = \dot{\mathbf{p}} - \dot{\boldsymbol{\pi}}(\hat{\boldsymbol{\theta}})$ be the vector of linearly independent first-order residual probabilities.

For a pair of variables, there are $(K-1)^2$ independent second-order marginal proportions (or model-implied probabilities) given the first-order margins. Again, an independent set may be obtained by removing every cell in the $K \times K$ two-way table in which either category code is 0. Then, let $\ddot{\mathbf{p}}$ and $\ddot{\boldsymbol{\pi}}(\hat{\boldsymbol{\theta}})$ be the vectors of length $s_2 = [n(n-1)/2](K-1)^2 = d_\rho (K-1)^2$ of all linearly independent second-order proportions and model-implied probabilities, respectively. Finally, let $\ddot{\mathbf{e}} = \ddot{\mathbf{p}} - \ddot{\boldsymbol{\pi}}(\hat{\boldsymbol{\theta}})$ be the vector of all linearly independent second-order residual probabilities.
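The linearly independent first- and second-order sample proportions can be tabulated directly from an $N \times n$ matrix of category codes by skipping every cell that involves category 0. The NumPy sketch below is one straightforward way to do this; the function name and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations


def marginal_proportions(Y, K):
    """Linearly independent first- and second-order sample proportions.

    Y is an N x n array of codes 0..K-1. Cells involving category 0 are
    dropped, leaving s1 = n(K-1) first-order and
    s2 = n(n-1)/2 * (K-1)^2 second-order proportions.
    """
    Y = np.asarray(Y)
    _, n = Y.shape
    p1 = [np.mean(Y[:, j] == k) for j in range(n) for k in range(1, K)]
    p2 = [np.mean((Y[:, l] == kl) & (Y[:, m] == km))
          for l, m in combinations(range(n), 2)
          for kl in range(1, K) for km in range(1, K)]
    return np.array(p1), np.array(p2)


Y = [[0, 1], [1, 1], [2, 0], [1, 2]]  # hypothetical: 4 respondents, K = 3
p1, p2 = marginal_proportions(Y, K=3)
print(p1, p2)
```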

With these definitions, we now explain how limited-information tests may be more easily applied than full-information tests. While first- and second-order subtables can still be affected by sparseness, these tables are necessarily better filled than the entire $n$-way contingency table with $\kappa$ cells. Consequently, limited-information tests are less vulnerable to the sparseness issue that affects the utility of full-information tests. Additionally, limited-information tests are potentially less computationally burdensome than full-information tests. For example, the number of first- and second-order probabilities ($s_1$ and $s_2$, respectively) may be much smaller than $\kappa$. For the PISA model, $s_1 = 36$ and $s_2 = 594$, while $\kappa > 16$ million. Finally, limited-information tests do not require a fully efficient estimator. Instead, they only require consistency and asymptotic normality (Maydeu-Olivares & Joe, 2006), which are properties enjoyed by numerous estimators for structural models of categorical data, including the multistage, pairwise likelihood (Katsikatsou, Moustaki, Yang-Wallentin, & Jöreskog, 2012), and polychoric instrumental variable (Bollen & Maydeu-Olivares, 2007) estimators.
4.2 Three Limited-Information Test Statistics 

The limited-information test of Maydeu-Olivares (2006) is noteworthy due to its 
application to structural models of categorical data. For convenience, let M denote this statistic. 
M is an unweighted sum of squares of the second-order residual probabilities in $\ddot{\mathbf{e}}$. The
distribution of M can be approximated by moment-matching (Satorra & Bentler, 1994). 

The M2 statistic (Maydeu-Olivares & Joe, 2006) is noteworthy here for at least two reasons. First, M2 and C2 have analogous structures, which will be presented below. Second, M2 has been widely applied in IRT modeling and is available in commercial IRT software (e.g., flexMIRT®; Cai, 2013). Like M, M2 uses the second-order residual probabilities in $\ddot{\mathbf{e}}$, but it also incorporates the first-order residual probabilities in $\dot{\mathbf{e}}$. Let $\mathbf{e}_2 = (\dot{\mathbf{e}}', \ddot{\mathbf{e}}')'$ be the vector of length $s = s_1 + s_2$ that collects all linearly independent first- and second-order residual probabilities. Then, M2 can be defined as

$$M_2 = N \mathbf{e}_2' \boldsymbol{\Omega}_2 \mathbf{e}_2, \tag{4}$$

where

$$\boldsymbol{\Omega}_2 = \boldsymbol{\Xi}_2^{-1} - \boldsymbol{\Xi}_2^{-1}\boldsymbol{\Delta}_2(\boldsymbol{\Delta}_2'\boldsymbol{\Xi}_2^{-1}\boldsymbol{\Delta}_2)^{-1}\boldsymbol{\Delta}_2'\boldsymbol{\Xi}_2^{-1}, \tag{5}$$

and all matrices are evaluated at $\hat{\boldsymbol{\theta}}$. In Equation (5), $\boldsymbol{\Xi}_2$ is the asymptotic covariance matrix of the first- and second-order sample proportions, and $\boldsymbol{\Delta}_2$ is the matrix of derivatives of the first- and second-order model-implied probabilities with respect to the parameter vector $\boldsymbol{\theta}$. In words, M2 is a quadratic form in the first- and second-order residual probabilities. The matrix of the quadratic form, $\boldsymbol{\Omega}_2$, weights these residual probabilities so that M2 is asymptotically chi-square distributed with $s - q$ degrees of freedom (Maydeu-Olivares & Joe, 2006).
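Equations (4) and (5) translate directly into matrix code. In practice, $\boldsymbol{\Xi}_2$ and $\boldsymbol{\Delta}_2$ come from the fitted model (see the Appendix); the sketch below uses random stand-ins, which are purely hypothetical, to show a key algebraic property of $\boldsymbol{\Omega}_2$: it assigns zero weight to residual directions spanned by the columns of $\boldsymbol{\Delta}_2$, which is how the $q$ estimated parameters are removed from the degrees of freedom.

```python
import numpy as np


def omega_weight(Xi, Delta):
    """Omega = Xi^-1 - Xi^-1 Delta (Delta' Xi^-1 Delta)^-1 Delta' Xi^-1."""
    Xi_inv = np.linalg.inv(Xi)
    Xi_inv_Delta = Xi_inv @ Delta          # s x q
    middle = Delta.T @ Xi_inv_Delta        # q x q
    return Xi_inv - Xi_inv_Delta @ np.linalg.solve(middle, Xi_inv_Delta.T)


def m2_statistic(e2, Xi, Delta, N):
    """M2 = N e2' Omega e2 (Equation 4) and its degrees of freedom s - q."""
    Omega = omega_weight(Xi, Delta)
    return float(N * e2 @ Omega @ e2), Xi.shape[0] - Delta.shape[1]


rng = np.random.default_rng(0)
s, q = 6, 2
L = rng.normal(size=(s, s))
Xi = L @ L.T + s * np.eye(s)      # hypothetical positive definite Xi_2
Delta = rng.normal(size=(s, q))   # hypothetical derivative matrix Delta_2
Omega = omega_weight(Xi, Delta)
print(np.allclose(Omega @ Delta, 0.0))  # True
```

Solving with `np.linalg.solve` instead of explicitly inverting the $q \times q$ middle matrix is a conventional numerical choice; it does not change the result.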

While M and M2 are more robust to sparseness than full-information statistics, they can still be affected by the issue when the number of variable categories is large. As explained by Cai and Hansen (2012), this is because, for some pairs of variables, certain response combinations are highly unlikely. For example, in the PISA survey, a student is unlikely to respond “strongly agree” to the item “I learn mathematics quickly” while also responding “strongly disagree” to the item “In my mathematics class, I understand even the most difficult work.” As shown in Cai and Hansen (2012), this sparseness in the $K \times K$ two-way tables can negatively impact the Type I error rates and power of M2. Additionally, when both $K$ and the number of variables are relatively large (i.e., when $s_2$ is very large), it can become computationally burdensome to calculate, store, and manipulate all of the second-order residual probabilities in $\ddot{\mathbf{e}}$, the derivatives, and the even larger number of elements in the weight matrix.

C2 addresses these issues by collapsing each $K \times K$ two-way table of residuals into a single residual moment. This is facilitated by using the ordered category codes $k = 0, \ldots, K-1$ as the raw scores. Let $\ddot{e}_{lm,k_l k_m}$ be the second-order marginal residual probability for variables $l$ and $m$ in categories $k_l$ and $k_m$, respectively. The residual moment for variables $l$ and $m$ is given by the weighted sum

$$\bar{e}_{lm} = \sum_{k_l=1}^{K-1} \sum_{k_m=1}^{K-1} k_l k_m \, \ddot{e}_{lm,k_l k_m}. \tag{6}$$

In words, $\bar{e}_{lm}$ sums all of the second-order residual probabilities for variables $l$ and $m$, weighted by the product of the two corresponding category codes. These second-order marginal residual moments can be collected into a vector $\bar{\mathbf{e}} = (\bar{e}_{21}, \bar{e}_{31}, \ldots, \bar{e}_{n,n-1})'$ of dimension $s_3 = n(n-1)/2 = d_\rho$. Then, let the vector $\mathbf{r}_2 = (\dot{\mathbf{e}}', \bar{\mathbf{e}}')'$, with dimension $d = s_1 + s_3$, collect all of the linearly independent first-order marginal residual probabilities as well as the collapsed second-order marginal residual moments.
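Equation (6) is a bilinear form in the category codes, so collapsing a $K \times K$ residual table takes one matrix product; cells involving category 0 drop out automatically because their code is zero. A useful check, shown below with hypothetical data, is that applying the same weights to the table of sample proportions returns the sample mean of the product $y_l y_m$. Each collapsed residual therefore compares an observed cross-product moment with its model-implied counterpart. The function name is hypothetical.

```python
import numpy as np


def residual_moment(resid_table):
    """Equation (6): sum over k_l, k_m >= 1 of k_l * k_m * e_{lm,k_l k_m}.

    resid_table[a, b] is the second-order residual for categories a and b
    (codes 0..K-1); row and column 0 carry weight 0, so they drop out.
    """
    codes = np.arange(resid_table.shape[0])
    return float(codes @ resid_table @ codes)


# Hypothetical responses of four people to variables l and m (K = 3).
y_l = np.array([0, 1, 2, 1])
y_m = np.array([1, 1, 0, 2])
table = np.zeros((3, 3))
np.add.at(table, (y_l, y_m), 1.0 / len(y_l))  # sample proportions
print(residual_moment(table), np.mean(y_l * y_m))
```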
C2 is then a quadratic form in $\mathbf{r}_2$, defined as

$$C_2 = N \mathbf{r}_2' \mathbf{U}_2 \mathbf{r}_2, \tag{7}$$

where

$$\mathbf{U}_2 = \boldsymbol{\Sigma}_2^{-1} - \boldsymbol{\Sigma}_2^{-1}\mathbf{J}_2(\mathbf{J}_2'\boldsymbol{\Sigma}_2^{-1}\mathbf{J}_2)^{-1}\mathbf{J}_2'\boldsymbol{\Sigma}_2^{-1}, \tag{8}$$

and all matrices are evaluated at $\hat{\boldsymbol{\theta}}$. The construction of C2 parallels that of M2, with $\mathbf{r}_2$ replacing $\mathbf{e}_2$ and corresponding changes made in the weight matrix $\mathbf{U}_2$. That is, in Equation (8), $\boldsymbol{\Sigma}_2$ is the asymptotic covariance matrix of the first-order sample proportions and collapsed second-order sample moments, and $\mathbf{J}_2$ is the matrix of derivatives of the first-order model-implied probabilities and collapsed second-order model-implied moments with respect to the parameter vector $\boldsymbol{\theta}$. The matrix of the quadratic form, $\mathbf{U}_2$, weights the residual probabilities and moments so that C2 is asymptotically chi-square distributed with $d - q$ degrees of freedom (Cai & Monroe, 2014).
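The computational savings can be quantified by counting the lengths of the residual vectors entering M2 ($s = s_1 + s_2$) and C2 ($d = s_1 + s_3$), together with the number of elements in the corresponding weight matrices. The short sketch below does this arithmetic for the 12-item, 4-category PISA model; the function name is hypothetical.

```python
from math import comb


def statistic_dimensions(n, K):
    """Residual-vector length and weight-matrix size for M2 and C2."""
    s1 = n * (K - 1)                # first-order residuals (used by both)
    s2 = comb(n, 2) * (K - 1) ** 2  # second-order residuals (M2)
    s3 = comb(n, 2)                 # collapsed residual moments (C2)
    s, d = s1 + s2, s1 + s3
    return {"M2": (s, s * s), "C2": (d, d * d)}


print(statistic_dimensions(n=12, K=4))
# {'M2': (630, 396900), 'C2': (102, 10404)}
```

For binary items (K = 2) the two vectors have the same length, since each 2 × 2 table contributes a single independent residual; the savings grow with K.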
4.3 Technical Details for C2

A derivation of C2, and its application to IRT, is given in Cai and Monroe (2014). We 
refer interested readers to that report. However, the application of C2 to structural models of
categorical data in this research necessitates the presentation of certain technical topics, which 
are contained in the Appendix. 

These topics include: 1) satisfaction of regularity conditions by the multistage estimator; 
2) calculation of model-implied probabilities; and 3) calculation of the derivatives of the first 
and second-order model-implied probabilities with respect to the vector of parameter estimates. 
5 Simulation Study for C2

A simulation study was conducted to compare the C2 statistic with the traditional $\bar{T}_U$ and $\bar{T}_D$ statistics in terms of Type I error rates and power. The sample sizes considered were $N = 100$, 200, 500, and 1000. The form of the generating structural model was identical to the theorized mediation model presented in Figure 1. Referring to the notation presented earlier, the latent variables PSC, ANX, and TASK can be considered $\eta_1$, $\eta_2$, and $\eta_3$, respectively. The true structural parameters in $\mathbf{B}$ were $\beta_{21} = 0.3$, $\beta_{31} = 0.4$, and $\beta_{32} = 0.36$, values used in Finch et al. (1997).
5.1 Design: Data Generation 
For the null condition, a population correlation matrix, $\mathbf{P}_0$, was calculated via Equation (1), using the factor loadings and unique variances shown in Table 2. For each of 500 replications, $\mathbf{y}_i^* \sim N_n(\mathbf{0}, \mathbf{P}_0)$ were sampled to form a dataset of continuous underlying variables. Let $\mathbf{Y}^*$ be this dataset. Then, $\mathbf{Y}^*$ was discretized to yield three categorical datasets, $\mathbf{Y}^{(K)}$, for $K = 2$, 4, and 6. For a given replication, the categorical datasets are “nested” in the following sense. First, $\mathbf{Y}^*$ was discretized using 5 thresholds per variable to yield $\mathbf{Y}^{(6)}$. Next, a random subset of the thresholds, fixed over replications, was used to create $\mathbf{Y}^{(4)}$. Finally, a further random subset of the thresholds, fixed over replications, was used to create $\mathbf{Y}^{(2)}$. The thresholds and subsets are presented in Table 2.
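The nested discretization can be sketched as follows. The thresholds, subsets, and population matrix below are hypothetical placeholders (the values actually used are in Table 2, and, for brevity, the same thresholds are applied to every variable here). The point is that $\mathbf{Y}^{(4)}$ and $\mathbf{Y}^{(2)}$ are coarsenings of $\mathbf{Y}^{(6)}$ computed from the same draw of $\mathbf{Y}^*$.

```python
import numpy as np

# Hypothetical thresholds: five cut points give K = 6; fixed nested subsets
# give K = 4 (three cut points) and K = 2 (one cut point).
TAU6 = np.array([-1.5, -0.5, 0.0, 0.5, 1.2])
SUBSET4 = [0, 2, 4]
SUBSET2 = [2]  # must be a subset of SUBSET4 for the datasets to be nested


def simulate_nested(P0, N, rng):
    """Draw Y* ~ N_n(0, P0) once and discretize it three ways."""
    n = P0.shape[0]
    y_star = rng.multivariate_normal(np.zeros(n), P0, size=N)

    def cut(taus):
        # Category = number of thresholds strictly below y*, as in Equation (2).
        return np.searchsorted(taus, y_star, side="left")

    return {6: cut(TAU6), 4: cut(TAU6[SUBSET4]), 2: cut(TAU6[SUBSET2])}


P0 = np.full((3, 3), 0.3)
np.fill_diagonal(P0, 1.0)  # hypothetical population correlation matrix
data = simulate_nested(P0, N=200, rng=np.random.default_rng(1))
```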
Insert Table 2 about here


To study the power of C2, we used the steps just detailed, but introduced model error 
when generating the population correlation matrices. Specifically, structural model error was 
introduced using a variation of the Cudeck and Browne (1992) procedure. Given a choice of 
discrepancy function, the Cudeck and Browne (1992) procedure produces a correlation matrix 
with a prespecified discrepancy function value. To be consistent with the choice of estimator 
for the simulated categorical datasets, we chose the ordinary least squares discrepancy function. 
In a slight variation of the original procedure, we specified an exact population RMSEA value instead of a discrepancy function value, because the former is more familiar. Let $\varepsilon_0^*$ be this value, where the asterisk emphasizes that the definition is at the level of the continuous underlying response variables, $\mathbf{y}^*$. The chosen values for $\varepsilon_0^*$ were .01, .05, and .10. For continuous, normally distributed outcomes, these values are often considered cutoffs for “excellent,” “close,” and “mediocre” fit, respectively (see, e.g., Browne & Cudeck, 1993), though alternative cutoff values exist (e.g., Hu & Bentler, 1999). An example population correlation matrix for the $\varepsilon_0^* = .10$ model is shown in Table 3.


Insert Table 3 about here 


5.2 Design: Estimation and Collected Statistics 

For each simulated data set, the mediation model shown in Figure 1 was estimated 
twice in Mplus (Muthén & Muthén, 2010), once with ULS and once with DWLS. These two 
model fittings yielded $\bar{T}_U$ and $\bar{T}_D$, respectively. The ULS parameter estimates were then used
along with the replication’s dataset to obtain the C2 statistic. To the extent that the ULS and
DWLS point estimates differ, the resulting C2 values will also differ. However, we found this
difference to be negligible and chose to report only the ULS-based C2.

Solutions were checked for propriety. A solution was deemed improper if the estimated
error variance was negative for any variable; these replications were discarded and not included
in the results. Collected statistics include the proportion of properly converged replications and
rejection rates at common alpha levels. For all test statistics, the empirical mean and variance
were recorded. Also, for the null condition, two-sided Kolmogorov-Smirnov (K-S) tests were
conducted, comparing each statistic's empirical distribution with its reference chi-square
distribution.
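The per-condition calibration summaries can be sketched as follows, assuming scipy (`scipy.stats.chi2`, `scipy.stats.kstest`) is available and that each statistic is referred to a chi-square distribution with df = 51. The function `summarize_null` is hypothetical. The usage example feeds it draws from the reference distribution itself, a stand-in for a perfectly calibrated statistic, so its empirical mean and variance should land near df and 2df.

```python
import numpy as np
from scipy import stats


def summarize_null(T, df, alphas=(0.01, 0.05, 0.10)):
    """Calibration summaries for one null-condition cell.

    T holds the statistic from each properly converged replication.
    Returns the empirical mean and variance (to compare with df and 2*df),
    rejection rates at each alpha, and the two-sided K-S p-value against
    the chi-square(df) reference distribution.
    """
    T = np.asarray(T, dtype=float)
    p_values = stats.chi2.sf(T, df)  # upper-tail p-value per replication
    rates = {a: float(np.mean(p_values < a)) for a in alphas}
    ks = stats.kstest(T, stats.chi2(df).cdf)  # two-sided by default
    return {
        "mean": float(T.mean()),
        "var": float(T.var(ddof=1)),
        "rejection": rates,
        "ks_p": float(ks.pvalue),
    }


# Stand-in data: draws from the reference distribution itself, i.e. a
# perfectly calibrated hypothetical statistic over 500 replications.
rng = np.random.default_rng(1)
fake_T = rng.chisquare(51, size=500)
summary = summarize_null(fake_T, df=51)
print(round(summary["mean"], 1), round(summary["var"], 1), summary["rejection"])
```

The same function applies unchanged to the power conditions: there, the rejection rate at α = .05 is the empirical power reported in Table 5.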

After collecting and examining the results, it became clear that the results for DWLS
and $T_D$ were very similar to those for ULS and $T_U$. Thus, we report only the latter results.
5.3 Results: Null Condition 

Table 4 presents the results for the null condition of the simulation study. As expected, 
the proportion of valid replications increases with N and K. For instance, whereas the 
proportion of valid replications for N = 100 and K = 2 is 0.71, for N = 1000 and K = 4, all 
replications converged properly. Generally, the calibration of the statistics also improves with 
increases in N and K. For the N = 100 and K = 2 condition, neither statistic is well-calibrated, 
as measured by the K-S p-values. This conclusion is supported by the Type I error rates, which 
differ substantially from the nominal values. We can also compare the empirical means and 
variances of $C_2$ and $T_U$ to the mean (df) and variance (2df) of the reference chi-square
distribution (df = 51). For this condition, the empirical distributions of $C_2$ and $T_U$ appear
stochastically smaller than the reference. On the other hand, for the largest sample size
(N = 1000) and K = 6 condition, both statistics appear well-calibrated, as evidenced by the
Type I error rates and K-S p-values.


Insert Table 4 about here 


Examining Table 4 more closely, $C_2$ appears to be better-calibrated than $T_U$ at smaller
sample sizes or with smaller K. At N = 100, $C_2$ appears reasonably well-calibrated for both
K = 4 and K = 6, as evidenced by the non-significant K-S p-values (.099 and .169, respectively)
and Type I error rates that approximately track the nominal levels. In contrast, $T_U$ has
significant p-values for these conditions (.016 and .003, respectively). Turning to K = 2, at
N = 200, $C_2$ again appears better-calibrated than $T_U$, as the latter statistic clearly under-rejects
the null hypothesis.

In summary, there are conditions, particularly with small N or small K, where $C_2$ is
well-calibrated while $T_U$ is not. However, there are no conditions where $T_U$ is well-calibrated
while $C_2$ is not. Thus, $C_2$ appears to be slightly better-calibrated than $T_U$.
5.4 Results: Power

Table 5 presents empirical rejection rates at the α = .05 level when model error is
introduced via $\varepsilon_0^*$. The cells shaded in gray correspond to conditions under the null where the
K-S p-values were significant. Since the significant p-values suggest the statistic may not be
well-calibrated, care should be taken in interpreting these rejection rates. If we limit our
evaluation to the non-shaded cells, then it is clear that $C_2$ is generally more powerful than $T_U$.
In many cases, the difference in power is quite small, and at the highest values of $\varepsilon_0^*$ and N,
both statistics have power at or near 1.0 and cannot be distinguished. However, in other cases,
such as $\varepsilon_0^* = .05$, N = 500, and K = 4, the difference in rejection rates is substantial (.820 and
.570 for $C_2$ and $T_U$, respectively). Also, because $C_2$ appears generally better-calibrated than
$T_U$, there are conditions where the rejection rate for $C_2$ may be the only meaningful result.
Based on Table 5, $C_2$ has more power than $T_U$ in detecting the model error introduced via the
Cudeck and Browne (1992) procedure.


Insert Table 5 about here 


As mentioned earlier, in practice, with a sufficiently large sample size and any amount
of model error, the proposed model will be rejected by an overall test, such as $T_U$ or $C_2$. In this
event, practitioners routinely examine fit indices, such as RMSEA, to assess the approximate fit
of the model. Given our simulation procedure, one RMSEA, based on $T_U$, may be obtained
from the Mplus output. However, an alternative RMSEA, based on $C_2$, may also be
calculated. In the next section, we compare these two RMSEA estimates, and investigate how 
they are affected by the number of variable categories. 

6 The Relationship Between RMSEA and Number of Categories 

This section uses the simulation results of Section 5 to study RMSEA for structural
models of categorical data. To study power in Section 5, structural model error was introduced
with a specified population RMSEA value, denoted by $\varepsilon_0^*$. Again, the chosen values for $\varepsilon_0^*$
were .01, .05, and .10. Let $\hat{\varepsilon}^{(K)}$ be the sample RMSEA estimate for $Y^{(K)}$, where $\hat{\varepsilon}^{(K)}$
may be based on either $T_U$ or $C_2$. Then, the interpretation of RMSEA for categorical data may
be studied in two ways. First, for a given simulation condition, the $\hat{\varepsilon}^{(K)}$ values may be averaged
over the 500 replications so that sampling error becomes negligible. Let $\bar{\varepsilon}^{(K)}$ be such an
average. Then, $\bar{\varepsilon}^{(K)}$ may be directly compared to $\varepsilon_0^*$, with discrepancies suggesting that the
population RMSEA values for the continuous underlying response variables and the discretized
categorical variables are not the same. Second, for each $Y^*$, the $\hat{\varepsilon}^{(K)}$ values for the nested
datasets may be compared to one another. Any systematic relationship that holds across the
Monte Carlo replications would also be of interest.


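The RMSEA estimate $\hat{\varepsilon}^{(K)}$ was obtained by

$$\hat{\varepsilon}^{(K)} = \sqrt{\max\left(\frac{T - df}{N \cdot df},\; 0\right)},$$

where T is either $C_2$ or $T_U$, df is the corresponding degrees of freedom, and N is the sample
size. Across the 500 replications of each condition, the mean RMSEA value and the empirical
5th and 95th percentiles were recorded.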
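A minimal Python sketch of the RMSEA formula and of the per-condition summaries follows. The function names are hypothetical, and the sketch assumes N (rather than N - 1) in the denominator; some programs use N - 1, which makes a negligible difference at N = 1000. Applied to the test statistics reported in Table 6 (df = 51, N = 1000), it reproduces the tabled RMSEA point estimates.

```python
import math

import numpy as np


def rmsea(T, df, N):
    """Sample RMSEA: sqrt(max((T - df) / (N * df), 0))."""
    return math.sqrt(max((T - df) / (N * df), 0.0))


def summarize_rmsea(estimates):
    """Mean and empirical 5th/95th percentiles of RMSEA estimates across replications."""
    est = np.asarray(estimates, dtype=float)
    lower, upper = np.percentile(est, [5, 95])
    return float(est.mean()), float(lower), float(upper)


# Test statistics from Table 6 (df = 51, N = 1000).
for name, T in [("C2", 116.61), ("T_U", 138.30), ("T_D", 199.42)]:
    print(name, round(rmsea(T, df=51, N=1000), 3))
# C2 0.036
# T_U 0.041
# T_D 0.054
```

The truncation at 0 matters in the null and near-null conditions: whenever a replication's T falls below df, its RMSEA estimate is exactly 0, which pulls the lower percentile of the empirical interval to 0.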

Figure 2 displays the means and empirical 90% confidence intervals for selected
simulation conditions. Results for the N = 100 sample size have been omitted, as they are quite
similar to those for N = 200. A number of trends in Figure 2 are noteworthy. Overall, $\bar{\varepsilon}^{(K)}$
based on $C_2$ is greater than the corresponding $\bar{\varepsilon}^{(K)}$ based on $T_U$. This is expected, as $C_2$ is
generally the more powerful statistic. Also, as expected, the sampling variability of $\hat{\varepsilon}^{(K)}$
decreases for larger N, as evidenced by the shorter line segments spanning the 90% confidence
intervals. Note, however, that for any given $\varepsilon_0^*$ and K, the $\bar{\varepsilon}^{(K)}$ values are relatively stable
across the various sample sizes.


Insert Figure 2 about here 


For the $\varepsilon_0^* = .01$ conditions (the top row of plots in Figure 2), the $\bar{\varepsilon}^{(K)}$ values do not
appear to depend on K. Further, all of the $\bar{\varepsilon}^{(K)}$ estimates are near $\varepsilon_0^* = .01$, and for all N and
K, the 90% empirical confidence interval of $\hat{\varepsilon}^{(K)}$ spans $\varepsilon_0^*$. For the $\varepsilon_0^* = .05$ conditions (the
middle row of plots in Figure 2), the pattern of results is quite different. There is a clear
dependence on K, with $\bar{\varepsilon}^{(K)}$ increasing in K. Also, for all N and K, $\bar{\varepsilon}^{(K)} < \varepsilon_0^* = .05$. And, for
the largest sample size, the 90% empirical confidence intervals of $\hat{\varepsilon}^{(K)}$ do not span $\varepsilon_0^*$. Finally,
the pattern of results for the $\varepsilon_0^* = .10$ conditions (the bottom row of plots in Figure 2) is quite
similar to that of the $\varepsilon_0^* = .05$ conditions. Again, $\bar{\varepsilon}^{(K)}$ clearly increases with K, and is always
less than $\varepsilon_0^* = .10$ for the studied conditions.

Figure 3 presents results from another perspective, focusing on the “nested” nature of
the datasets for the N = 1000 and $\varepsilon_0^* = .10$ condition. That is, Figure 3 gives a more detailed
look at the results corresponding to the lower-right plot in Figure 2. For each replication, there
is an $\hat{\varepsilon}^{(K)}$ value for K = 2, 4, and 6. Further, because the underlying continuous data are
available in a simulation, an RMSEA estimate can also be computed by fitting the structural
model to them. Denote this estimate as $\hat{\varepsilon}^*$. Figure 3 shows the relationship among these
various RMSEA estimates (based on $C_2$ for the categorical data and on ordinary least squares
for the continuous underlying response data).


Insert Figure 3 about here 


For this condition, from Figure 2, we know that $\bar{\varepsilon}^{(K)}$ increases with K. Beyond this
average trend, Figure 3 makes clear that, for this condition, the RMSEA estimates for “nested”
datasets are positively correlated. An implication of Figure 3 is that, for a dataset from this
condition, any decrease in the number of categories will likely result in a smaller RMSEA
estimate. For other conditions, though, the various RMSEA estimates may be more weakly
correlated. Factors that influence the strength of the relationships include the magnitudes of N
(since a smaller N leads to increased sampling variability) and $\varepsilon_0^*$ (since RMSEA is bounded
below by 0). Finally, Figure 3 illustrates that with the continuous underlying variables (y-axes for
the top row of plots), the $\hat{\varepsilon}^*$ values estimate $\varepsilon_0^*$ with little bias, as their distribution appears to
center on the true RMSEA value. In this case, the empirical mean is .099, very close to $\varepsilon_0^* = .10$.

From Figures 2 and 3, it is clear that $\hat{\varepsilon}^{(K)}$ is a poor estimate of $\varepsilon_0^*$. As one extreme
example, consider the K = 2 and N = 1000 condition when $\varepsilon_0^* = .10$. In this case, $\bar{\varepsilon}^{(2)} = .034$
for $C_2$ and .027 for $T_U$. Based on these large discrepancies, we reason that $\hat{\varepsilon}^{(K)}$ is approximating
a different population value, due to the discretization process. Let $\varepsilon_0^{(K)}$ be such a value. To the
extent that $\bar{\varepsilon}^{(K)}$ is a reasonable estimate of $\varepsilon_0^{(K)}$, it is clear that $\varepsilon_0^{(K)} \neq \varepsilon_0^*$. Also, for
relatively large values of $\varepsilon_0^*$ (e.g., .05 or .10), $\varepsilon_0^{(K)}$ is always less than $\varepsilon_0^*$ in the studied
conditions. Further, for such conditions, $\varepsilon_0^{(K)}$ appears to converge towards $\varepsilon_0^*$ as K increases,
though the convergence is slow. Larger values of K (e.g., 10) would help in exploring this
apparent convergence; however, such values are uncommon in empirical data and were not
included in the simulation. In any case, Figures 2 and 3 suggest that the guidelines developed for
RMSEA interpretation using continuous data may not be applicable to categorical data.

Also, $\hat{\varepsilon}^{(K)}$, and presumably $\varepsilon_0^{(K)}$, clearly depend on the underlying test statistic, $C_2$ or
$T_U$. For the studied conditions, $\hat{\varepsilon}^{(K)}$ based on $C_2$ is a less biased estimate of $\varepsilon_0^*$. In other words,
the $C_2$-based RMSEA for the categorical datasets is generally a better estimate of the population
RMSEA defined at the level of the continuous data. In summary, even when the population
RMSEA for the continuous underlying response variables is fixed, the estimated value of
RMSEA for categorical variables depends on a number of factors, including the discrepancy
function, the number of categories per variable, and the choice of underlying test statistic.
7 Empirical Application 

In this section, we apply C2 to the PISA example presented in Section 2. We also 
calculate the RMSEA estimates and discuss their interpretation in light of the simulation study 
results. Only a random subset (N = 1000 complete cases) of the United States school sample is 
used. For this illustration, we ignore the complex sampling design of the survey, though it 
would need to be modeled for proper inference. Rather than producing valid substantive
findings, our goals here are to demonstrate the utility of $C_2$ in assessing a structural model of
real data and to highlight the challenges in interpreting RMSEA for such models.


Insert Table 6 about here 


The model was fitted twice in Mplus, once with ULS and once with DWLS. The
overall model fit statistics and select fit indices are presented in Table 6. For all of the test
statistics (i.e., the ULS-based $C_2$, $T_U$, and $T_D$), p < .001. With a sample this large (N = 1000),
however, the chi-square tests have high power to detect even small amounts of model error.
Turning to the RMSEA estimates, the $C_2$-based estimate (.036) is less than either the $T_U$- or
$T_D$-based estimates (.041 and .054, respectively). This is not inconsistent with the simulation
study results, in which the $C_2$-based RMSEA estimates were greater than the $T_U$- or $T_D$-based
RMSEA estimates only on average and can certainly be smaller on occasion. Also, it is possible
that $T_U$ and $T_D$ are more powerful than $C_2$ against certain types of model error. Applying
conventional guidelines for RMSEA interpretation, the observed estimates are all near the .05
cutoff for “close fit.” In particular, the upper bound of the 90% confidence interval for the
$C_2$-based estimate is .044, which lends further support to the position that the theorized
mediation model is close-fitting. However, the results from the simulation study suggest that
guidelines developed for use with continuous data may be less applicable to categorical data.
More specifically, for models with K = 4, the conventional guidelines may be too lenient.
Examining Figure 2 again, we may have reason to believe that in the categorical case, with
K = 4, the RMSEA estimates are about 20-30% smaller than in the continuous case.
Consequently, at least for $C_2$, perhaps the cutoff between “close” and “not close” should be
around .03 as opposed to .05. This, however, is merely a conjecture rather than a suggested
guideline.
8 Discussion and Conclusion 

In this research, limited-information testing principles, heretofore applied primarily
in the context of IRT, were applied to SEM of ordinal data. Specifically, the $C_2$ statistic proposed
in Cai and Monroe (2014) was compared to test statistics based on quadratic forms in polychoric
correlation residuals. $C_2$ was shown to perform at least as well as the competing statistics in
terms of both calibration under the null and power. For some conditions, $C_2$ clearly
outperformed the other statistics. This research also took the opportunity presented by the
simulation study to examine the behavior of the RMSEA fit index under varying conditions. 
While guidelines for RMSEA interpretation of continuous variables have been developed over 
many years, the use of RMSEA for assessing fit of categorical variables is a much more recent 
phenomenon. The simulation results suggest that the magnitude of RMSEA estimates is 
surprisingly dependent on the number of variable categories. 

While we believe this research has contributed to the area of model fit assessment for
categorical SEM, it has also left many questions unanswered. Regarding the $C_2$ statistic, it is
unknown how $C_2$ will perform under other conditions. Notably, $C_2$ should be studied with
larger models, as the simulation study in this research focused on a relatively small model (with
only 12 variables). Also, it would be interesting to study $C_2$ when the underlying continuous
variables are not normal. Presumably, $C_2$ would have more power to detect this sort of
misspecification than statistics that assume multivariate normality of the underlying response
variables. Additionally, the statistic itself can be further developed for structural models of
categorical data. Under multistage estimation, the univariate sample proportions are perfectly
reproduced by the threshold estimates, so all first-order residuals are equal to zero. In this case,
perhaps $C_2$, and other limited-information statistics, can be simplified.
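The vanishing first-order residuals can be illustrated with a short sketch, assuming saturated thresholds estimated in the first stage as $\hat{\tau}_{j,k} = \Phi^{-1}$ of the cumulative proportion of responses in the first k categories. The category counts are hypothetical, the function names are illustrative, and only Python's standard library (`statistics.NormalDist`) is used.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def first_stage_thresholds(counts):
    """Saturated threshold estimates: tau_k = Phi^{-1}(proportion in the first k categories).

    Assumes the first and last categories are observed (non-zero counts).
    """
    n = sum(counts)
    taus, cumulative = [], 0
    for c in counts[:-1]:  # K categories -> K - 1 thresholds
        cumulative += c
        taus.append(PHI.inv_cdf(cumulative / n))
    return taus


def implied_first_order(taus):
    """Model-implied category probabilities Phi(tau_k) - Phi(tau_{k-1}), open-ended at both ends."""
    cdf = [0.0] + [PHI.cdf(t) for t in taus] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


counts = [150, 300, 350, 200]  # hypothetical category counts for one K = 4 item (N = 1000)
taus = first_stage_thresholds(counts)
proportions = [c / sum(counts) for c in counts]
residuals = [p - pi for p, pi in zip(proportions, implied_first_order(taus))]
print([round(t, 3) for t in taus])
print(max(abs(r) for r in residuals) < 1e-12)  # True: first-order residuals vanish
```

Since these residuals are zero by construction, only the second-order residuals carry information about misfit, which is the observation behind the suggested simplification.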

As for the interpretation of RMSEA for categorical data, a number of questions
deserve further study. Again, since the simulation study used only one model size, it is unclear
to what extent model size will affect the behavior of RMSEA. Additionally, while the Cudeck
and Browne (1992) procedure proved convenient in this research as a method of introducing
model error, other forms of model misspecification (e.g., omitted cross-loadings) could elicit
different behaviors of RMSEA. Also, given how RMSEA appears to depend on the number of
categories in the outcome variables, to what extent can corrections or adjustments to RMSEA
make the fit index easier to interpret or more useful? Finally, RMSEA is but one fit index. It
stands to reason that other fit indices based on chi-square statistics (e.g., TLI) may also depend
on the number of categories. In any case, both the current research and potential future
research topics reinforce the notion that practitioners should exercise caution in interpreting fit
index values (see, e.g., Marsh, Hau, & Wen, 2004). In closing, while this research has
contributed to the understanding of model fit assessment for categorical data, much work
remains.
Appendix


Regularity Conditions for the Multistage Estimator 


Maydeu-Olivares and Joe (2006) assumed regularity conditions on the model that must
be satisfied for application of the limited-information testing methodology. There must be a
matrix H such that

$$\sqrt{N}(\hat{\theta} - \theta) \stackrel{a}{=} H\sqrt{N}(p - \pi), \tag{10}$$

where $\stackrel{a}{=}$ denotes asymptotic equivalence. Maydeu-Olivares and Joe (2006) presented H for
the maximum likelihood estimator. Here, H is presented for the multistage estimator.
Essentially, the approach taken here is to piece together results from Maydeu-Olivares (2006),
which also considers asymptotic properties of the multistage estimator.

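Let $\Delta = \partial\kappa(\theta)/\partial\theta'$ be a $d \times q$ matrix, where $\kappa(\theta)$ denotes the model-implied
thresholds and polychoric correlations estimated in the first two stages. Recall that W is the
$d \times d$ matrix used in the third stage of estimation. Then, let $M = (\Delta' W \Delta)^{-1}\Delta' W$ be a $q \times d$
matrix. The estimates of the structural parameters may be expressed as a linear function of the
estimates from the first and second stages,

$$\sqrt{N}(\hat{\theta} - \theta) \stackrel{a}{=} M\sqrt{N}(\hat{\kappa} - \kappa), \tag{11}$$

which is Equation (18) in Maydeu-Olivares (2006). The $d \times s_2$ matrix G, defined in Equation
(14) of Maydeu-Olivares (2006), is used to account for the first and second stages of estimation.
Then, the estimates of the structural parameters may be expressed as a linear function of the
underlying sample proportions and probabilities,

$$\sqrt{N}(\hat{\theta} - \theta) \stackrel{a}{=} MGL\sqrt{N}(p - \pi), \tag{12}$$

where L is an operator matrix (see, e.g., Cai and Hansen, 2013) with $s_2$ rows that maps the full
vector of response-pattern proportions to the first- and second-order marginal proportions,
$p_2 = Lp$. Taking $H = MGL$ satisfies the requirements for the multistage estimator.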
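The matrix M can be formed directly from its definition. The sketch below uses random stand-ins for $\Delta$ and W with hypothetical dimensions d = 10 and q = 4, and checks only the algebraic property $M\Delta = I_q$, which follows immediately from $M = (\Delta' W \Delta)^{-1}\Delta' W$. For ULS, W would be an identity matrix, and for DWLS a diagonal matrix. Forming G and L requires the model-specific results cited above and is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 10, 4  # hypothetical: d first/second-stage estimates, q structural parameters

Delta = rng.standard_normal((d, q))  # stand-in for d kappa(theta) / d theta'
A = rng.standard_normal((d, d))
W = A @ A.T + d * np.eye(d)          # stand-in symmetric positive-definite weight matrix

# M = (Delta' W Delta)^{-1} Delta' W, computed with a linear solve
# rather than an explicit matrix inverse.
M = np.linalg.solve(Delta.T @ W @ Delta, Delta.T @ W)

print(M.shape)                             # (4, 10), i.e. q x d
print(np.allclose(M @ Delta, np.eye(q)))  # True
```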

Model-Implied Probabilities 

Calculation of $r_2$ requires first- and second-order model-implied probabilities. The
covariance matrix in Equation (8), $\Xi_2$, requires first-, second-, third-, and fourth-order model-
implied probabilities. Details of the pattern of model-implied probabilities necessary for $\Xi_2$ can
be found in Cai and Hansen (2013). According to the model, we can find the marginal
probability of any subset of v variables as

$$\Pr\left(\bigcap_{j=1}^{v} \{Y_j = k_j\}\right) = \int_{\mathcal{P}} \phi_v(y^*; 0, P_v)\, dy^*, \tag{13}$$

where $\phi_v(\cdot)$ denotes a v-variate normal density and $\mathcal{P}$ is a v-dimensional parallelepiped region of
integration given by $\mathcal{P} = \bigotimes_{j=1}^{v} (\tau_{j,k_j}, \tau_{j,k_j+1})$. The correlation matrix $P_v$ is the $v \times v$ submatrix
of P. The regions of integration obviously depend on the thresholds τ, and the correlations
between the underlying variables depend on other free parameters of θ, according to Equation
(1). If v = n, Equation (13) provides the marginal probability of an entire response pattern. And
for v < n, Equation (13) can be used with any subset of the items to find marginal probabilities
of any order as needed. For this research, we calculated Equation (13) for up to fourth-order
probabilities using the Monte Carlo approach presented in Genz (1992). Though observed
proportions could be substituted for the probabilities, these would likely prove unstable, in
particular for smaller sample sizes.
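The integral in Equation (13) can be approximated with a crude Monte Carlo estimator, sketched below. This is a deliberately simple stand-in, not the Genz (1992) method used in this research, which is considerably more efficient. The example uses items 1 and 2, whose population correlation in Table 3 (correctly specified model) is .511, and a hypothetical three-category coarsening built from their third and fourth thresholds in Table 2. The function name is hypothetical.

```python
import numpy as np


def rectangle_probability_mc(lower, upper, corr, n_draws=100_000, seed=0):
    """Crude Monte Carlo estimate of Pr(lower_j < Y*_j < upper_j for all j).

    Y* ~ N(0, corr). Use -np.inf / np.inf for the open-ended extreme categories.
    A fixed seed means repeated calls reuse exactly the same draws.
    """
    corr = np.atleast_2d(np.asarray(corr, dtype=float))
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_draws)
    inside = np.all((draws > np.asarray(lower)) & (draws < np.asarray(upper)), axis=1)
    return float(inside.mean())


# Items 1 and 2: population correlation .511 (Table 3, correctly specified model).
P_v = [[1.0, 0.511], [0.511, 1.0]]
# Hypothetical three-category coarsening from the 3rd and 4th thresholds in Table 2.
tau_item1 = [-np.inf, -0.28, 0.28, np.inf]
tau_item2 = [-np.inf, -0.07, 0.36, np.inf]

# Second-order (bivariate) probabilities for all 3 x 3 category combinations.
cells = np.array([
    [rectangle_probability_mc([tau_item1[a], tau_item2[b]],
                              [tau_item1[a + 1], tau_item2[b + 1]], P_v)
     for b in range(3)]
    for a in range(3)
])
print(cells.round(3))
print(cells.sum())  # the cells partition the plane and share draws, so this is 1 up to rounding
```

Setting v = 1 (a single lower and upper bound with `corr=[[1.0]]`) gives first-order probabilities, and passing a larger $P_v$ gives third- or fourth-order ones, although the Monte Carlo error of this crude estimator grows relative to the small probabilities involved.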
Derivatives of the First and Second-Order Model-Implied Probabilities 
The weight matrix of $C_2$ in Equation (8), $U_2$, depends on $J_2$. Instead of focusing on the
elements of $J_2$, it is sufficient to focus on the elements of $\Delta_2$, as $J_2 = T\Delta_2$ for an appropriate
operator matrix T. $\Delta_2$ is the matrix of derivatives of first- and second-order model-implied
probabilities with respect to θ. Without loss of generality of the method, we make two
simplifying assumptions for ease of exposition. Namely, we assume that there are no
additional constraints placed on the free parameters, and that the thresholds are saturated, i.e.,
the model contains as many location parameters as there are thresholds. Following our
notational convention, $\pi_2(\theta) = (\dot{\pi}_1(\theta)', \dot{\pi}_2(\theta)')'$. It is also convenient to partition the
components of θ in the following way. Again, assuming saturated thresholds, let $\theta_\tau$ be those
parameters that model τ, and let $\theta_\rho$ be those parameters that model ρ (free parameters in Λ, B,
etc.). Then, $\theta = (\theta_\tau', \theta_\rho')'$, and $\Delta_2$ may be partitioned as

$$\Delta_2 = \begin{pmatrix} \dfrac{\partial \dot{\pi}_1(\theta)}{\partial \theta_\tau'} & \dfrac{\partial \dot{\pi}_1(\theta)}{\partial \theta_\rho'} \\[1ex] \dfrac{\partial \dot{\pi}_2(\theta)}{\partial \theta_\tau'} & \dfrac{\partial \dot{\pi}_2(\theta)}{\partial \theta_\rho'} \end{pmatrix}. \tag{14}$$

As the first-order moments do not depend on correlations, the upper-right block of $\Delta_2$ is 0.
Maydeu-Olivares (2006, Appendix 2) presents results for the upper-left and lower-left blocks of
$\Delta_2$. In the same Appendix 2, results are given for $\partial\dot{\pi}_2(\theta)/\partial\rho'$. By the chain rule, the lower-right
block may be obtained as the product of $\partial\dot{\pi}_2(\theta)/\partial\rho'$ and $\partial\rho/\partial\theta_\rho'$. Thus, the elements of
$\partial\rho/\partial\theta_\rho'$ are needed, which are standard results in the SEM literature (Bock & Bargmann, 1966).
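For the first-order probabilities with saturated thresholds (so that $\theta_\tau$ is simply the vector of thresholds), the upper-left block of $\Delta_2$ has a closed form: the probability of category $k$ of variable j is $\Phi(\tau_{j,k+1}) - \Phi(\tau_{j,k})$, so its derivative is $\phi(\tau_{j,k+1})$ with respect to the upper threshold, $-\phi(\tau_{j,k})$ with respect to the lower threshold, and 0 otherwise. No correlation enters this expression, which is why the upper-right block is 0. The sketch below, with hypothetical function names and 0-based category indices, checks the closed form against central finite differences for variable 1 of Table 2.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.pdf is phi


def category_prob(taus, k):
    """First-order probability of category k (0-based) given ordered thresholds.

    Category k lies between taus[k-1] and taus[k]; the extreme categories are open-ended.
    """
    lower = PHI.cdf(taus[k - 1]) if k > 0 else 0.0
    upper = PHI.cdf(taus[k]) if k < len(taus) else 1.0
    return upper - lower


def analytic_grad(taus, k):
    """Closed-form derivatives of category_prob(taus, k) with respect to each threshold."""
    grad = [0.0] * len(taus)
    if k < len(taus):
        grad[k] = PHI.pdf(taus[k])           # upper threshold
    if k > 0:
        grad[k - 1] = -PHI.pdf(taus[k - 1])  # lower threshold
    return grad


def numeric_grad(taus, k, h=1e-6):
    """Central finite-difference derivatives, used to check analytic_grad."""
    grad = []
    for m in range(len(taus)):
        up, down = list(taus), list(taus)
        up[m] += h
        down[m] -= h
        grad.append((category_prob(up, k) - category_prob(down, k)) / (2 * h))
    return grad


taus = [-1.27, -0.69, -0.28, 0.28, 1.19]  # variable 1 thresholds, Table 2
max_err = max(
    abs(a - n)
    for k in range(len(taus) + 1)
    for a, n in zip(analytic_grad(taus, k), numeric_grad(taus, k))
)
print(max_err < 1e-6)  # True
```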
Table 1


Accepted in Multivariate Behavioral Research 


Prompts and Item Wording for the PISA Empirical Example

| Construct/Item | Stem/Wording |
|---|---|
| PSC | How much do you disagree or agree with the following statements? |
| 1 | I get good <marks> in mathematics. |
| 2 | I learn mathematics quickly. |
| 3 | I have always believed that mathematics is one of my best subjects. |
| 4 | In my mathematics class, I understand even the most difficult work. |
| ANX | How much do you disagree or agree with the following statements? |
| 5 | I often worry that it will be difficult for me in mathematics class. |
| 6 | I get very tense when I have to do mathematics homework. |
| 7 | I get very nervous doing mathematics problems. |
| 8 | I feel helpless when doing a mathematics problem. |
| TASK | How confident do you feel about having to do the following calculations? |
| 9 | Using a <train timetable>, how long it would take to get from Zedville to Zedtown |
| 10 | Calculating how many square metres of tiles you need to cover a floor |
| 11 | Finding the actual distance between two places on a map with a 1:10,000 scale |
| 12 | Calculating the petrol consumption rate of a car |

Note. PSC = positive self-concept as a mathematics student. ANX = mathematics anxiety. TASK = task-specific confidence.


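Table 2
Simulation Study: True Generating Parameters

| j | $\tau_{j,1}$ | $\tau_{j,2}$ | $\tau_{j,3}$ | $\tau_{j,4}$ | $\tau_{j,5}$ | $\lambda_{j,1}$ | $\lambda_{j,2}$ | $\lambda_{j,3}$ | $\psi_{j,j}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -1.27 | -0.69 | -0.28 | 0.28 | 1.19 | 0.70 | 0 | 0 | 0.51 |
| 2 | -1.11 | -0.71 | -0.07 | 0.36 | 0.73 | 0.73 | 0 | 0 | 0.47 |
| 3 | -0.74 | -0.39 | -0.03 | 0.24 | 1.15 | 0.73 | 0 | 0 | 0.47 |
| 4 | -1.15 | -0.26 | 0.06 | 0.66 | 1.20 | 0.69 | 0 | 0 | 0.52 |
| 5 | -0.64 | -0.18 | 0.21 | 0.57 | 0.94 | 0 | 0.65 | 0 | 0.54 |
| 6 | -1.17 | -0.54 | -0.23 | 0.47 | 1.15 | 0 | 0.73 | 0 | 0.42 |
| 7 | -1.15 | -0.45 | -0.17 | 0.18 | 0.74 | 0 | 0.73 | 0 | 0.42 |
| 8 | -1.07 | -0.38 | 0.07 | 0.55 | 1.09 | 0 | 0.67 | 0 | 0.51 |
| 9 | -0.80 | -0.45 | -0.07 | 0.22 | 0.52 | 0 | 0 | 0.62 | 0.47 |
| 10 | -1.02 | -0.26 | 0.12 | 0.46 | 1.06 | 0 | 0 | 0.68 | 0.36 |
| 11 | -1.11 | -0.47 | 0.40 | 0.76 | 1.19 | 0 | 0 | 0.76 | 0.20 |
| 12 | -1.07 | -0.18 | 0.10 | 0.37 | 1.10 | 0 | 0 | 0.61 | 0.48 |

Note. For K = 6 categories, $\tau_{j,m}$ is the mth ordered threshold for variable j. For K = 4, the
subset of thresholds is in boldface. For K = 2, the further subset of thresholds is also italicized.
$\lambda_{j,p}$ is the loading of the jth variable on the pth factor. $\psi_{j,j}$ is the unique variance of variable j.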


Table 3

Population Correlation Matrices for Correctly Specified Model (Lower Triangle) and Model
with $\varepsilon_0^* = .10$ (Upper Triangle)

| Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | .532 | .459 | .455 | .219 | .099 | .116 | .121 | .266 | .254 | .303 | .231 |
| 2 | .511 | 1.000 | .569 | .479 | .164 | .164 | .117 | .184 | .255 | .147 | .265 | .185 |
| 3 | .511 | .533 | 1.000 | .549 | .094 | .162 | .147 | .206 | .141 | .213 | .290 | .218 |
| 4 | .483 | .504 | .504 | 1.000 | .185 | .064 | .219 | .130 | .226 | .200 | .249 | .213 |
| 5 | .137 | .142 | .142 | .135 | 1.000 | .473 | .515 | .551 | .292 | .270 | .200 | .157 |
| 6 | .153 | .160 | .160 | .151 | .517 | 1.000 | .645 | .541 | .220 | .298 | .262 | .294 |
| 7 | .153 | .160 | .160 | .151 | .517 | .581 | 1.000 | .469 | .252 | .268 | .232 | .337 |
| 8 | .141 | .147 | .147 | .139 | .475 | .533 | .533 | 1.000 | .284 | .220 | .295 | .184 |
| 9 | .208 | .217 | .217 | .205 | .219 | .246 | .246 | .226 | 1.000 | .568 | .694 | .436 |
| 10 | .228 | .238 | .238 | .225 | .240 | .270 | .270 | .248 | .586 | 1.000 | .721 | .628 |
| 11 | .255 | .266 | .266 | .252 | .269 | .302 | .302 | .277 | .655 | .719 | 1.000 | .646 |
| 12 | .205 | .214 | .214 | .202 | .216 | .242 | .242 | .222 | .526 | .577 | .645 | 1.000 |


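Table 4

Simulation Results: Null Condition

| K | N | Stat | Reps | Mean | Var | .01 | .05 | .10 | K-S |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 100 | $C_2$ | .71 | 48.6 | 102.9 | .020 | .034 | .070 | < .001 |
|  |  | $T_U$ | .71 | 50.2 | 60.5 | .006 | .025 | .053 | < .001 |
|  | 200 | $C_2$ | .88 | 50.1 | 95.0 | .009 | .034 | .066 | .371 |
|  |  | $T_U$ | .88 | 50.3 | 63.2 | .007 | .016 | .052 | .001 |
|  | 500 | $C_2$ | .98 | 51.5 | 113.6 | .012 | .059 | .128 | .348 |
|  |  | $T_U$ | .98 | 51.5 | 95.0 | .012 | .053 | .108 | .490 |
|  | 1000 | $C_2$ | 1.00 | 50.8 | 116.2 | .012 | .064 | .114 | .526 |
|  |  | $T_U$ | 1.00 | 51.4 | 98.6 | .006 | .062 | .100 | .885 |
| 4 | 100 | $C_2$ | .97 | 51.9 | 97.0 | .010 | .052 | .115 | .099 |
|  |  | $T_U$ | .97 | 51.2 | 68.6 | .008 | .027 | .054 | .013 |
|  | 200 | $C_2$ | 1.00 | 51.5 | 99.6 | .014 | .064 | .102 | .459 |
|  |  | $T_U$ | 1.00 | 50.8 | 75.7 | .002 | .028 | .074 | .090 |
|  | 500 | $C_2$ | 1.00 | 51.4 | 114.3 | .014 | .064 | .116 | .260 |
|  |  | $T_U$ | 1.00 | 51.3 | 96.8 | .012 | .054 | .108 | .561 |
|  | 1000 | $C_2$ | 1.00 | 51.4 | 109.6 | .018 | .052 | .102 | .667 |
|  |  | $T_U$ | 1.00 | 51.2 | 95.9 | .010 | .054 | .082 | .762 |
| 6 | 100 | $C_2$ | .99 | 51.9 | 96.3 | .010 | .062 | .123 | .169 |
|  |  | $T_U$ | .99 | 51.4 | 69.1 | .006 | .036 | .073 | .002 |
|  | 200 | $C_2$ | 1.00 | 51.6 | 107.1 | .016 | .074 | .106 | .632 |
|  |  | $T_U$ | 1.00 | 51.0 | 86.1 | .010 | .050 | .096 | .183 |
|  | 500 | $C_2$ | 1.00 | 51.3 | 108.7 | .008 | .064 | .112 | .516 |
|  |  | $T_U$ | 1.00 | 51.2 | 104.0 | .016 | .050 | .108 | .976 |
|  | 1000 | $C_2$ | 1.00 | 51.5 | 105.1 | .014 | .056 | .108 | .699 |
|  |  | $T_U$ | 1.00 | 51.2 | 94.3 | .014 | .054 | .090 | .430 |

Note. K is the number of categories per variable. ‘Reps’ is the proportion of valid replications.
Columns .01, .05, and .10 give rejection rates at those nominal α levels. ‘K-S’ is the two-sided
Kolmogorov-Smirnov p-value. The degrees of freedom for the model are 51.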
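Table 5

Simulation Results: Power at α = .05 Level

| $\varepsilon_0^*$ | Stat | N=100, K=2 | N=100, K=4 | N=100, K=6 | N=200, K=2 | N=200, K=4 | N=200, K=6 | N=500, K=2 | N=500, K=4 | N=500, K=6 | N=1000, K=2 | N=1000, K=4 | N=1000, K=6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .01 | $C_2$ | .033 | .064 | .056 | .062 | .062 | .074 | .065 | .080 | .080 | .092 | .116 | .108 |
|  | $T_U$ | .014 | .023 | .032 | .025 | .036 | .050 | .053 | .072 | .060 | .070 | .078 | .084 |
| .05 | $C_2$ | .057 | .176 | .219 | .118 | .364 | .450 | .185 | .820 | .920 | .476 | .996 | 1.000 |
|  | $T_U$ | .027 | .068 | .081 | .055 | .173 | .212 | .140 | .570 | .692 | .308 | .958 | .986 |
| .10 | $C_2$ | .085 | .708 | .884 | .241 | .982 | .998 | .794 | 1.000 | 1.000 | .996 | 1.000 | 1.000 |
|  | $T_U$ | .041 | .298 | .400 | .115 | … | .886 | .498 | 1.000 | 1.000 | .910 | 1.000 | 1.000 |

Note. $\varepsilon_0^*$ is the population RMSEA. K is the number of categories per variable.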
Table 6

PISA Data Example: Test Statistics and Select Fit Indices

| Stat | df | Value | p-value | TLI | RMSEA | 90% CI |
|---|---|---|---|---|---|---|
| $C_2$ | 51 | 116.61 | <.001 | .997 | .036 | (.027, .044) |
| $T_U$ | 51 | 138.30 | <.001 | .989 | .041 | (.033, .050) |
| $T_D$ | 51 | 199.42 | <.001 | .992 | .054 | (.046, .062) |

Note. ‘TLI’ = Tucker-Lewis Index. ‘90% CI’ = 90% confidence interval for the RMSEA estimate.


Figure Captions 


Figure 1. Ordinal Structural Model for PISA Example

Circles represent latent variables. PSC = positive self-concept as a mathematics student. ANX =
mathematics anxiety. TASK = task-specific confidence. β = regression weight. ζ = equation disturbances.
Squares represent observed variables. ε = unique factors.


Figure 2. Mean and Empirical 90% Confidence Intervals for RMSEA Estimates Based on $C_2$ and $T_U$
For each row of plots, the dashed line marks the value of $\varepsilon_0^*$. K is the number of categories per
variable. N is sample size.


Figure 3. Bivariate Plots of RMSEA Estimates for “Nested” Datasets when $\varepsilon_0^* = 0.10$ and N = 1000
RMSEA estimates are based on $C_2$. For each plot, each point represents 1 of 500 Monte Carlo replications.
The axis labels (K) indicate the number of categories per variable in the dataset. In the top row of plots,
$y^*$ indicates continuous data. Dotted lines mark .05. Dashed lines mark $\varepsilon_0^* = .10$.